Skip to content

feat: add health check endpoint to CLI serve#1100

Open
markstur wants to merge 5 commits into
generative-computing:mainfrom
markstur:health
Open

feat: add health check endpoint to CLI serve#1100
markstur wants to merge 5 commits into
generative-computing:mainfrom
markstur:health

Conversation

@markstur
Copy link
Copy Markdown
Contributor

@markstur markstur commented May 19, 2026

Pull Request

Issue

Fixes #1066

Description

Adds /health endpoint to the FastAPI server for Kubernetes liveness probes.

  • /health: always returns 200 (liveness check)

Future readiness check (i.e., /ready) deferred to be done in a separate PR.

Assisted-by: IBM Bob

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

  • Component
  • Requirement
  • Sampling Strategy
  • Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

Adds /health and /ready endpoints to the FastAPI server for Kubernetes liveness and readiness probes.

- /health: always returns 200 (liveness check)
- /ready: returns 200 when ready, 503 otherwise (readiness check)

The readiness check is basic check that run_server happened which provides the
chat endpoint.  In the future this could be extended to let serve modules report
readiness of their backends (etc, needs some design). Would also need to be adapted
appropriately when we add support for multiple serve modules.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Assisted-by: IBM Bob
@markstur markstur requested a review from a team as a code owner May 19, 2026 19:17
@markstur markstur requested review from jakelorocco and planetf1 May 19, 2026 19:17
@github-actions github-actions Bot added the enhancement New feature or request label May 19, 2026
@markstur
Copy link
Copy Markdown
Contributor Author

Did a little bit of import cleanup (unused or moved to top) which was not related to the PR but I think is good to do while touching the files vs the overhead of a nit PR. I'm open to splitting if preferred.

Comment thread cli/serve/app.py Outdated
if _server_ready:
return {"status": "ready"}
else:
raise HTTPException(status_code=503, detail="Server not ready")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an exception? I don't think anything on our side is actually causing failures. I think it should just be a response with the 503 status code and same detail / status message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uvicorn handles this the way you want it to

Comment thread cli/serve/app.py Outdated
Comment on lines 326 to 342
@@ -295,5 +334,9 @@ def run_server(
methods=["POST"],
response_model=ChatCompletion | OpenAIErrorResponse,
)

# Mark server as ready after route is successfully registered
_server_ready = True

typer.echo(f"Serving {route_path} at http://{host}:{port}")
uvicorn.run(app, host=host, port=port)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to even hit the readiness endpoint before the server is ready since the fastapi app isn't run with uvicorn until after the module is loaded and the route added?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will remove /ready for now. That's what I get for sneaking in a bit extra.

FYI --

I cannot hit the not-ready state with the expected run_server() use and since I put _server_ready=True in run_server() it's not very usable any other way right now. I didn't realize that even a breakpoint() would not make this work.

Technically, however, I can run the server w/o using run_server and I get healthy but not ready:
uv run python -c 'from cli.serve.app import app; import uvicorn; uvicorn.run(app, host="0.0.0.0", port="8080")' <-- but don't care about that right now.

I will add an issue to implement /ready later. It should:

1 - handle shutdown signals for k8s!
2 - be for future use (speculative, but if we want to allow module(s) to have startup or other ready dependencies)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree that this is a good improvement to have. I think it just requires changing how the server setup happens so that the server can start up even while the things it relies on (the chat endpoint) is still loading.

* changed status value to "pass" because that is IETF standard.  k8s doesn't care what the string is.
* removing the /ready implementation which is not ready and not part of this issue

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur
Copy link
Copy Markdown
Contributor Author

NOTE: I also changed the status return json to {"status": "pass"} because I learned that is IETF standard while k8s doesn't care what the string is. pass, ok, healthy are common, but pass s most standard.

@markstur markstur requested a review from jakelorocco May 21, 2026 00:43
@markstur markstur enabled auto-merge May 21, 2026 00:43
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title and description say this adds both a liveness probe (/health) and a readiness probe (/ready), but the diff only includes /health. Marking as changes requested since it's incomplete relative to what's described — happy to approve once either the /ready endpoint is added or the PR scope is narrowed to health-only.

Comment thread cli/serve/app.py Outdated


@app.get("/health", status_code=200)
async def health_check():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing return type — validation_exception_handler below has one, so this reads a bit inconsistent. Easy fix:

Suggested change
async def health_check():
async def health_check() -> dict[str, str]:

Comment thread test/cli/test_serve.py Outdated

def test_health_check(self):
"""Test that /health endpoint returns 200 with correct JSON response."""
client = TestClient(app)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the other test classes here use fixtures or class-level setup rather than constructing the client inline, but this is totally fine for a single test. Just worth a comment so the next person adding tests knows this is intentional:

Suggested change
client = TestClient(app)
# /health is registered at module-load time — TestClient(app) is correct here
client = TestClient(app)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a 2nd test now (thanks), so I added a fixture and this suggested comment is in the fixture

Comment thread test/cli/test_serve.py
response = client.get("/health")

assert response.status_code == 200
assert response.json() == {"status": "pass"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: one edge case worth pinning while you're here — what does POST /health return? FastAPI will give 405 by default, which is the right behaviour (a misconfigured probe fails loudly rather than silently passing). A quick check:

    def test_health_check_rejects_post(self):
        client = TestClient(app)
        assert client.post("/health").status_code == 405

Comment thread cli/serve/app.py Outdated
)


@app.get("/health", status_code=200)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: status_code=200 is FastAPI's default for GET so it's a no-op — the code works correctly either way. Just flagging in case you want to trim it:

Suggested change
@app.get("/health", status_code=200)
@app.get("/health")

markstur added 2 commits May 21, 2026 11:44
* removing the redundant explicit setting of status code

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
* GET is success 200
* POST is 405 (just reinforcing with a test)

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur changed the title feat: add health and readiness check endpoints to CLI serve API feat: add health check endpoint to CLI serve May 21, 2026
Misc improvements from PR review:
* fixture
* comment
* return type
* add test for POST

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur requested a review from planetf1 May 21, 2026 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cli/serve Health check endpoint. GET /health returning {"status": "pass"}.

3 participants